618 results found.
Speech/Written
Corpus,
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
CreativeCommons
Size:
62 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Stefan Riezler | LibriVoxDeEN | /N |
Documentation:
English documentation
Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic Chinese Czech English Finnish French German Hindi Indonesian Italian Japanese Korean Polish Portuguese Russian Spanish Swedish Thai Turkish
Availability:
Freely Available
License:
CC-BY-SA
Size:
300 KByte
Production Status:
Newly created-finished
Use:
Emotion Recognition/Generation
- Paper title: How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Hiroshi Kanayama | Parallel Sentiment | /N |
Documentation:
For 19 languages (ar,cs,de,en,es,fi,fr,hi,id,it,ja,ko,pl,pt,ru,sv,th,tr,zh)
Written
Corpus,
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
CC-BY-SA 3.0
Size:
1,600,000 tokens
Production Status:
Existing-used
Use:
Document Classification, Text categorisation
- Paper title: Automatic Orality Identification in Historical Texts
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Stefanie Dipper | Deutsches Textarchiv (DTA) | /N |
Documentation:
Publicly available documentation in German
Modality Independent
Lexicon,
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
CreativeCommons
Size:
101509 entries
Production Status:
Newly created-finished
Use:
Natural Language Generation
- Paper title: MucLex: A German Lexicon for Surface Realisation
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Daniel Braun | MucLex | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
German
Availability:
Freely Available
License:
CC BY 4.0
Size:
2157048 tokens
Production Status:
Newly created-finished
Use:
Named Entity Recognition
- Paper title: A Dataset of German Legal Documents for Named Entity Recognition
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Georg Rehm | Legal Entity Recognition (LER) | /N |
Documentation:
https://github.com/elenanereiss/Legal-Entity-Recognition
Written
Corpus,
Language Type:
Monolingual
Languages:
German
Availability:
From Data Center(s)
License:
Query and Analysis only, CC-NC
Size:
46900000000 words
Production Status:
Existing-updated
Use:
Corpus Linguistics
- Paper title: RKorAPClient: An R Package for Accessing the German Reference Corpus DeReKo via KorAP
- Paper track: Written/poster presentation with demo
- Paper status: Accept Poster+Demo
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Marc Kupietz | German Reference Corpus DeReKo | /N |
Documentation:
http://www.dereko.de
Speech/Written
Corpus,
Language Type:
Monolingual
Languages:
German
Availability:
License:
Size:
None Other
Production Status:
Use:
Corpus Creation/Annotation
- Paper title: Annotation of Emotion Carriers in Personal Narratives
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Aniruddha Tammewar | Ulm State-of-Mind in Speech (USoMs) | /N |
Documentation:
None
Written
Representation-Annotation Formalism/Guidelines,
Language Type:
Monolingual
Languages:
German
Availability:
License:
Size:
None
Production Status:
Use:
Parsing and Tagging
- Paper title: A Penn-style Treebank of Middle Low German
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Hannah Booth | TIGER Annotation scheme | /N |
Documentation:
None
Written
Corpus,
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
Creative Commons Attribution-ShareAlike 3.0
Size:
38,000 annotated full-text documents Other
Production Status:
Existing-used
Use:
Evaluation/Validation
- Paper title: Is Language Modeling Enough? Evaluating Effective Embedding Combinations
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rudolf Schneider | WikiSection | /N |
Documentation:
https://github.com/sebastianarnold/WikiSection in English
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GByte
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | /N |
Documentation:
None




